-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store/cache: fix broken metric and current index cache size handling #873
Conversation
Adding uint64(len(b)) to c.curSize might potentially overflow uint64 if the numbers are big enough and then we might not remove enough items from the LRU to satisfy the request. On the other hand, switching the operands avoids this problem because we check before if uint64(len(b)) is bigger than c.maxSize so subtracting uint64(len(b)) will *never* overflow because we know that it is less or equal to c.maxSize.
pkg/store/cache.go
Outdated
for c.curSize+uint64(len(b)) > c.maxSize { | ||
c.lru.RemoveOldest() | ||
for c.curSize > c.maxSize-uint64(len(b)) { | ||
if _, val, ok := c.lru.RemoveOldest(); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we handle something if that does not ok ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How we could inform the users about such case? AFAICT there is no way ok
can be false
due to the check before-hand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm . Isn't currSize updated in "onEvict"?
Ah yes, that's true but it is never increased. Will revert fbaae7c. |
c.curSize is lowered in onEvict.
Add smoke tests for the index cache which check if we set curSize properly, and if removal works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
2 flakes on tests, but now is green: https://circleci.com/gh/improbable-eng/thanos/2353 |
Potpourri of fixes to the cache:
Do not forget to increase the c.current metric with relevant labels when
adding items to the cache. Without this the metric
thanos_store_index_cache_items
is unavailable.Properly adjust c.curSize when adding items to the cache.
Switch the order of operands in a check to avoid a potential uint64 overflow and thus not removing enough items to satisfy our size constraints.
Without these last two fixes essentially the metric
thanos_store_index_cache_items_evicted_total
is always zero together withthanos_store_index_cache_items_overflowed_total
(most of the time). This is because we never enter the loop, and we never increase c.curSize before this.Verification: